CoGrOO: a Brazilian-Portuguese Grammar Checker based on the CETENFOLHA Corpus
Abstract
This paper describes an ongoing Portuguese-language grammar checker project, CoGrOO (Corretor Gramatical para OpenOffice, i.e., Grammar Checker for OpenOffice), based on CETENFOLHA, a morphosyntactically annotated corpus of Brazilian Portuguese. Two of its features are highlighted: a hybrid architecture that mixes rules and statistics, and its status as a free software project. The project aims to detect grammatical errors involving nominal and verbal agreement, “crase” (the coalescence of the preposition “a” (to) with the feminine singular definite article “a”, yielding “à”), and nominal and verbal government, as well as other common errors in Brazilian Portuguese. We also present some empirical results obtained with the implemented techniques.
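The sketch below illustrates the rule-based half of a hybrid checker like the one described above. It is a minimal sketch that assumes a statistical tagger has already assigned a part of speech, gender, and number to every token. It does not reproduce CoGrOO's actual rule format, tag set, or API. The `Token` and `Issue` classes, the tag names, the `GOVERNS_A` verb list, and the `check_*` functions are all hypothetical. Real crase detection depends on verbal and nominal government and has many exceptions, and the verb list here is only a toy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """A word already annotated by a (hypothetical) statistical tagger."""
    text: str
    pos: str          # hypothetical tags: "V", "N", "DET", "PRP"
    gender: str = ""  # "M", "F", or "" when not applicable
    number: str = ""  # "S", "P", or "" when not applicable


@dataclass(frozen=True)
class Issue:
    index: int    # position of the first token involved
    rule: str
    message: str


# Toy (hypothetical) list of verb forms that govern the preposition "a".
GOVERNS_A = {"vou", "vai", "fui", "foi", "chegou"}


def check_agreement(tokens):
    """Flag determiner + noun pairs whose gender or number disagree."""
    issues = []
    for i in range(len(tokens) - 1):
        det, noun = tokens[i], tokens[i + 1]
        if det.pos != "DET" or noun.pos != "N":
            continue
        if det.gender and noun.gender and det.gender != noun.gender:
            issues.append(Issue(i, "agreement", f"gender mismatch: '{det.text} {noun.text}'"))
        elif det.number and noun.number and det.number != noun.number:
            issues.append(Issue(i, "agreement", f"number mismatch: '{det.text} {noun.text}'"))
    return issues


def check_crase(tokens):
    """Flag a bare 'a' between a verb governing 'a' and a feminine singular noun."""
    issues = []
    for i in range(1, len(tokens) - 1):
        prev, cur, nxt = tokens[i - 1], tokens[i], tokens[i + 1]
        if (prev.pos == "V" and prev.text.lower() in GOVERNS_A
                and cur.text.lower() == "a"
                and nxt.pos == "N" and nxt.gender == "F" and nxt.number == "S"):
            issues.append(Issue(i, "crase", f"'{cur.text} {nxt.text}' may need 'à {nxt.text}'"))
    return issues


def check(tokens):
    """Run every rule and return the issues in sentence order."""
    return sorted(check_agreement(tokens) + check_crase(tokens), key=lambda issue: issue.index)


# "Vou a praia com os amigo": missing crase, plus a determiner/noun number mismatch.
EXAMPLE = [
    Token("Vou", "V"),
    Token("a", "PRP"),
    Token("praia", "N", "F", "S"),
    Token("com", "PRP"),
    Token("os", "DET", "M", "P"),
    Token("amigo", "N", "M", "S"),
]

if __name__ == "__main__":
    for issue in check(EXAMPLE):
        print(issue.index, issue.rule, issue.message)
```

Running the script reports two issues. The first is the missing crase in "a praia" at index 1. The second is the number mismatch in "os amigo" at index 4. The statistical tagging is kept separate from the rules, so each rule only inspects annotations and never has to disambiguate words itself.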
Similar resources
Improving CoGrOO: the Brazilian Portuguese Grammar Checker
This paper highlights the main results obtained in an effort to improve the grammar checker CoGrOO, a hybrid system that first annotates the text using statistical Natural Language Processing (NLP) techniques and then applies a rule-based analysis to identify possible grammar errors. The goal was to reduce omissions and false alarms while improving true positives without adding new error ru...
Baseline Acoustic Models for Brazilian Portuguese Using CMU Sphinx Tools
Advances in speech processing research rely on the availability of public resources such as corpora, statistical models and baseline systems. In contrast to languages such as English, there are few specific resources for Brazilian Portuguese. This work describes efforts aimed at narrowing this gap. Baseline acoustic models for Brazilian Portuguese were built using the CMU Sphinx toolkit and pub...
Phonologic Patterns of Brazilian Portuguese: a grapheme to phoneme converter based study
This paper presents the distribution patterns of Brazilian Portuguese phonemes, as determined by an automatic, grammar-rule-based grapheme-to-phoneme converter. The software Nhenhém (Vasilévski, 2008) was used to process the data: written texts were transcribed into phonological symbols to form a corpus, which was then subjected to statistical analysis. Results support the high level of predictability of Brazilian...
Segmentation Strategies to Face Morphology Challenges in Brazilian-Portuguese/English Statistical Machine Translation and Its Integration in Cross-Language Information Retrieval
The use of morphology is particularly interesting in the context of statistical machine translation, as a way to reduce data sparseness and compensate for a lack of training data. In this work, we propose several approaches to introduce morphological knowledge into a standard phrase-based machine translation system. We provide word segmentation using two different tools (COGROO and MORFESSOR) which...
Towards a Phonetic Brazilian Portuguese Spell Checker
Spell checking is no longer considered a big challenge for natural language processing, at least for the task of correcting documents during editing. Without human interaction, however, the system must automatically choose the word most likely to correct the misspelled word. Spell checking also faces a further difficulty: new types of errors in web material have...